Erroneous correspondences between samples and their respective channel or target commonly arise in several real-world applications. For instance, whole-brain calcium imaging of freely moving organisms, multiple target tracking or multi-person contactless vital sign monitoring may be severely affected by mismatched sample-channel assignments. To systematically address this fundamental problem, we pose it as a signal reconstruction problem where we have lost correspondences between the samples and their respective channels. We show that under the assumption that the signals of interest admit a sparse representation over an overcomplete dictionary, unique signal recovery is possible. Our derivations reveal that the problem is equivalent to a structured unlabeled sensing problem without precise knowledge of the sensing matrix. Unfortunately, existing methods are neither robust to errors in the regressors nor do they exploit the structure of the problem. Therefore, we propose a novel robust two-step approach for the reconstruction of shuffled sparse signals. The performance and robustness of the proposed approach is illustrated in an application of whole-brain calcium imaging in computational neuroscience. The proposed framework can be generalized to sparse signal representations other than the ones considered in this work to be applied in a variety of real-world problems with imprecise measurement or channel assignment.
translated by 谷歌翻译
Tourette Syndrome (TS) is a behavior disorder that onsets in childhood and is characterized by the expression of involuntary movements and sounds commonly referred to as tics. Behavioral therapy is the first-line treatment for patients with TS, and it helps patients raise awareness about tic occurrence as well as develop tic inhibition strategies. However, the limited availability of therapists and the difficulties for in-home follow up work limits its effectiveness. An automatic tic detection system that is easy to deploy could alleviate the difficulties of home-therapy by providing feedback to the patients while exercising tic awareness. In this work, we propose a novel architecture (T-Net) for automatic tic detection and classification from untrimmed videos. T-Net combines temporal detection and segmentation and operates on features that are interpretable to a clinician. We compare T-Net to several state-of-the-art systems working on deep features extracted from the raw videos and T-Net achieves comparable performance in terms of average precision while relying on interpretable features needed in clinical practice.
translated by 谷歌翻译
Breast cancer is one of the most common cancer in women around the world. For diagnosis, pathologists evaluate biomarkers such as HER2 protein using immunohistochemistry over tissue extracted by a biopsy. Through microscopic inspection, this assessment estimates the intensity and integrity of the membrane cells' staining and scores the sample as 0, 1+, 2+, or 3+: a subjective decision that depends on the interpretation of the pathologist. This paper presents the preliminary data analysis of the annotations of three pathologists over the same set of samples obtained using 20x magnification and including $1,252$ non-overlapping biopsy patches. We evaluate the intra- and inter-expert variability achieving substantial and moderate agreement, respectively, according to Fleiss' Kappa coefficient, as a previous stage towards a generation of a HER2 breast cancer biopsy gold-standard using supervised learning from multiple pathologist annotations.
translated by 谷歌翻译
在本文中,我们解决了包含人脸和声音的视频中的唇彩同步问题。我们的方法是基于确定视频中的嘴唇运动和声音是否同步,具体取决于其视听对应得分。我们提出了一个基于视听的跨模式变压器模型,该模型在标准的唇读语音基准数据集LRS2上胜过音频视频同步任务中的几个基线模型。尽管现有的方法主要集中在语音视频中的唇部同步上,但我们也考虑了歌声的特殊情况。由于持续的元音声音,唱歌声音是同步的更具挑战性的用例。我们还研究了在唱歌语音的背景下在语音数据集中训练的LIP同步模型的相关性。最后,我们使用在唱歌语音分离任务中通过唇部同步模型学到的冷冻视觉特征,以优于训练有素的端到端的基线音频视觉模型。演示,源代码和预训练的模型可在https://ipcv.github.io/vocalist/上找到。
translated by 谷歌翻译
本文提出了一种语音分离的视听方法,在两种情况下以低潜伏期产生最先进的结果:语音和唱歌声音。该模型基于两个阶段网络。运动提示是通过轻巧的图形卷积网络获得的,该网络处理面对地标。然后,将音频和运动功能馈送到视听变压器中,该变压器对隔离目标源产生相当好的估计。在第二阶段,仅使用音频网络增强了主导语音。我们提出了不同的消融研究和与最新方法的比较。最后,我们探讨了在演唱语音分离的任务中训练训练语音分离的模型的可传递性。https://ipcv.github.io/vovit/可用演示,代码和权重
translated by 谷歌翻译
这项工作探讨了CFGAN的再现性。 CFGan及其模型(Tagrec,MTPR和CRGAN)学会通过使用先前的交互来为TOP-N建议者生成个性化和假的偏好排名。这项工作成功复制了原始纸张中发布的结果,并讨论了CFGAN框架与原始评估中使用的模型之间的某些差异的影响。没有随机噪声和使用真实用户配置文件作为条件向量离开发电机容易发生一个退化的解决方案,其中输出矢量与输入向量相同,因此,表现为简单的AutoEncoder。该工作进一步扩展了比较CFGAN对一系列简单且众所周知的适当优化的基线的实验分析,尽管计算成本高,但仍观察CFGAN并不一致地对抗它们。为确保这些分析的再现性,这项工作描述了实验方法,并发布了所有数据集和源代码。
translated by 谷歌翻译
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds.
translated by 谷歌翻译